我们解决了用草图和文本查询检索图像的问题。我们提出任务形成器(文本和草图变压器),这是一种可使用文本说明和草图作为输入的端到端训练模型。我们认为,两种输入方式都以一种单独的方式无法轻易实现的方式相互补充。任务形成器遵循延迟融合双编码方法,类似于剪辑,该方法允许有效且可扩展的检索,因为检索集可以独立于查询而独立于索引。我们从经验上证明,与传统的基于文本的图像检索相比,除文本外,使用输入草图(甚至是绘制的草图)大大增加了检索召回。为了评估我们的方法,我们在可可数据集的测试集中收集了5,000个手绘草图。收集的草图可获得https://janesjanes.github.io/tsbir/。
translated by 谷歌翻译
Figure 1: We introduce datasets for 3D tracking and motion forecasting with rich maps for autonomous driving. Our 3D tracking dataset contains sequences of LiDAR measurements, 360 • RGB video, front-facing stereo (middle-right), and 6-dof localization. All sequences are aligned with maps containing lane center lines (magenta), driveable region (orange), and ground height. Sequences are annotated with 3D cuboid tracks (green). A wider map view is shown in the bottom-right.
translated by 谷歌翻译